Abstract

We investigate, by a systematic numerical study, how the stability of the
Kohonen Self-Organizing Map and of the concave and convex learning of Zheng
and Greenleaf depends on the learning parameters, for different input
distributions and for different input and output dimensions.

Topical groups: Advances in Neural Network Learning Methods, Neural and 
hybrid architectures and learning algorithms, Self-organization. 

Neural vector quantizers have become a widely used tool to explore
high-dimensional data sets by self-organized learning schemes. Compared to the
vast literature on variants and applications that has appeared over the last
two decades, the theoretical description has proceeded more slowly. Even for
the original Self-Organizing Map (SOM), open questions remain, such as a proper
description of the dynamics for the case of dimension reduction and varying
data dimensionality, or the question for which parameters stability of the
algorithm can be guaranteed. This paper is devoted to the latter question.
Stability criteria are especially interesting for modifications and variants,
such as the concave and convex learning whose magnification behaviour has been
discussed recently. Especially for the variants, analytical progress becomes
quite difficult, and in any case one expects the stability to depend on the
input distribution to an extent that is, apart from special cases, unknown. As
the invariant density is in general analytically inaccessible for input
dimensions larger than one (see [?] for recent tractable cases), we do not
expect a general theory to be available immediately, and instead proceed with
a systematic numerical exploration.

The Kohonen SOM, and the nonlinear variant of Zheng and Greenleaf. - The
class of algorithms investigated here is defined by the learning rule that, for
each stimulus $v \in V$, each weight vector $w_r$ is updated according to



$$ w_r^{\mathrm{new}} = w_r^{\mathrm{old}} + \epsilon \cdot g_{rs} \cdot (v - w_r^{\mathrm{old}})^{K} \qquad (1) $$

($g_{rs}$ being a Gaussian function (width $\sigma$) of the Euclidean distance
$|r - s|$ in the neural layer, thus describing the neural topology). Herein

$$ |w_s - v| = \min_{r} |w_r - v| \qquad (2) $$

determines for each stimulus v the best-matching unit or winner neuron. 
The case $K = 1$ is the original SOM [1], corresponding to a linear or Hebbian
learning rule. The generalization to $K$ or $1/K$ taking integer values has been
proposed by Zheng and Greenleaf [2], but arbitrary nonzero real values of $K$
can be used, and a limiting choice of $K$ has been shown (for the
one-dimensional case) to have an invariant density with the
information-theoretically optimal value of one for the magnification exponent,
i.e., the neural density is proportional to the input density and hence can be
used as a density estimator.
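
The update rule (1) with the winner selection (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations: the function name `concave_convex_step` is hypothetical, the Gaussian is assumed to be $\exp(-|r-s|^2 / 2\sigma^2)$, and the vector power is assumed to mean $(v - w)\,|v - w|^{K-1}$, which reduces to the linear SOM rule for $K = 1$.

```python
import numpy as np

def concave_convex_step(weights, grid, v, eps, sigma, K=1.0):
    """One on-line update of all weight vectors for a single stimulus v.

    weights: (N, d) array of weight vectors w_r
    grid:    (N, m) array of neuron positions r in the neural layer
    Returns the updated weights and the index s of the winner neuron.
    """
    diff = v - weights                       # v - w_r for every neuron
    dist = np.linalg.norm(diff, axis=1)      # |v - w_r|
    s = int(np.argmin(dist))                 # winner neuron, eq. (2)
    # Assumed neighborhood function: Gaussian of width sigma in the neural layer
    g = np.exp(-np.sum((grid - grid[s]) ** 2, axis=1) / (2.0 * sigma ** 2))
    # Assumed reading of the power: (v - w)^K := (v - w) * |v - w|^(K - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        scale = np.where(dist > 0, dist ** (K - 1.0), 0.0)
    new_weights = weights + eps * g[:, None] * scale[:, None] * diff
    return new_weights, s
```

With `K=1.0` the factor `scale` is 1 for every neuron and the step is the usual SOM update; other nonzero values of `K` rescale each step by a power of the distance to the stimulus.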

Convergence and stability. - It is well known that the learning rate
$\epsilon$ has to fulfill the Robbins-Monro conditions (see, e.g., [?]) to
ensure convergence, with all other parameters fixed. In practice, however, it
is necessary to use a large neighborhood width at the beginning, so that the
network of weight vectors becomes ordered in input space, and to decrease this
width in the course of time down to a small value that ensures topology
preservation during further on-line learning. The situation thus becomes more
involved when $\sigma$ is additionally made time-dependent. Here we consider
the strategy where the stability border in the $(\epsilon, \sigma)$ plane is
always approached from small $\epsilon$, with $\sigma$ fixed during this final
phase. An ordered state has to be generated by preceding learning phases.
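
For reference, the Robbins-Monro conditions on a time-dependent learning rate $\epsilon(t)$ are usually stated as below. The schedule $\epsilon(t) = \epsilon_0 / t$ is only a standard illustrative example that satisfies them, not a choice taken from this paper.

```latex
\sum_{t=1}^{\infty} \epsilon(t) = \infty,
\qquad
\sum_{t=1}^{\infty} \epsilon(t)^2 < \infty,
\qquad \text{e.g.} \quad \epsilon(t) = \frac{\epsilon_0}{t}.
```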

Measures for Topographical Stability. - To quantify the ordered state and the
topology preservation, a variety of measures is used, e.g. the topographic
product [?], the Zrehen measure [?], and the average quadratic reconstruction
error. For detecting unstable behaviour, all of these measures should be
suitable and give similar results. For an unstable and disordered map, the
total sum over all (squared) distances between adjacent weight vectors will
also increase significantly, so an increase beyond a threshold indicates
instability as well. This indicator is used below. For a large neighborhood (of
the order of the network size), however, the weight vectors shrink to a small
volume, which influences the results; this applies only to neighborhood widths
larger than those commonly used for the pre-ordering.
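
The distance-based indicator can be sketched as follows. This is a minimal illustration, assuming the weight vectors are stored on a regular grid and that the reference value is the sum measured in the ordered state. The function names are hypothetical, and the default threshold of 15% is the value quoted in the caption of Fig. 5.

```python
import numpy as np

def adjacent_sq_distance_sum(w):
    """Sum of squared distances between weight vectors of grid neighbours.

    w: array of shape (n_1, ..., n_m, d), i.e. an m-dimensional neural
       layer whose neurons carry d-dimensional weight vectors.
    """
    total = 0.0
    for axis in range(w.ndim - 1):           # one pass per lattice direction
        step = np.diff(w, axis=axis)         # w_{r+e_i} - w_r along this axis
        total += float(np.sum(step ** 2))
    return total

def is_unstable(current, reference, threshold=0.15):
    """Flag instability when the sum grew by more than `threshold` (relative)."""
    return current > (1.0 + threshold) * reference
```

In use, `reference` would be recorded once after the pre-ordering phase, and `current` re-evaluated while the learning rate is raised.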

In addition, we use here an approach even simpler than the Zrehen measure
(which counts the number of neurons that lie within a circle between each pair
of neurons that are adjacent in the neural layer). For a mapping from $d$ to
$d$ dimensions, we consider the determinant of the $d$ vectors
$w_{r+e_i} - w_r$, with $e_i$ being the $i$-th unit vector and
$1 \le i \le d$. The sign of this determinant gives the orientation of the set
of $d$ vectors. Note that the 1-dimensional case just reads
$\mathrm{sgn}(w_{r+1} - w_r)$, which has been widely considered to detect the
ordered state in the 1 to 1 dimensional case. Hence, we can define

$$ \chi(\{w_r\}) := \sum_r \mathrm{sgn} \det \left( w_{r+e_1} - w_r, \ldots, w_{r+e_d} - w_r \right) \qquad (3) $$

This evaluates the number of neurons $N_+$ (resp. $N_-$) where this sign is
positive (resp. negative); hence the relative fraction of minority signs is
given by $(1 - |N_+ - N_-|/N)$. A typical single defect is shown in Fig. 1. Due
to its simplicity, this measure $\chi$ will be used in the remainder.
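
For the 2D to 2D case, the orientation test described above (sign of the determinant of the vectors $w_{r+e_i} - w_r$, counted as $N_+$ and $N_-$) can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, only neurons that have both forward neighbours are evaluated, and $N$ is taken to be $N_+ + N_-$.

```python
import numpy as np

def orientation_counts(w):
    """Count positively and negatively oriented cells of a 2D -> 2D map.

    w: array of shape (n1, n2, 2) holding the weight vector of each neuron.
    Returns (n_plus, n_minus).
    """
    base = w[:-1, :-1]                       # neurons r with both forward neighbours
    a = w[1:, :-1] - base                    # w_{r+e_1} - w_r
    b = w[:-1, 1:] - base                    # w_{r+e_2} - w_r
    det = a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]
    n_plus = int(np.sum(det > 0))
    n_minus = int(np.sum(det < 0))
    return n_plus, n_minus

def minority_measure(n_plus, n_minus):
    """Expression 1 - |N+ - N-| / N from the text, with N = N+ + N- (assumed)."""
    n = n_plus + n_minus
    if n == 0:
        return 0.0
    return 1.0 - abs(n_plus - n_minus) / n
```

A perfectly ordered map gives a single sign everywhere and hence a measure of zero; any cell of opposite orientation makes it positive.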
Figure 1: Situation where one defect is detected by the crossproduct measure, eq. (3).



Modification of learning rules and data representation. - A classical result
[3, ?, ?] states that the neural density for the 1-D SOM (in the continuum
limit) approaches not the input density itself, but a power of it, with
exponent 2/3, the so-called magnification exponent. As pointed out by Linsker
[?], an exponent of 1 would correspond to the case of maximal mutual
information between input and output space. Different modifications of the
winner mechanism or the learning rule, by additive or multiplicative terms,
have been suggested and influence the magnification exponent [11, 12, 13, ?].
Here we investigate the case of concave and convex learning [?], which defines
a nonlinear generalization of the SOM.

Topographical Stability for the Self-Organizing Map. - Before investigating the
case of concave and convex learning, the stability measures should be tested
for the well-established SOM algorithm. Using the parameter path of Fig. 2, we
first analyze the 2D-2D case for three input distributions: the homogeneous
input density (equidistribution), an inhomogeneous input distribution
$\sim \sin(\pi x_i)$ [14], and a varying-dimension dataset (Figs. 3 and 4).
The results are shown in Fig. 5.

Figure 2: Schematic diagram of the parameter path in $(\epsilon, \sigma)$
space. Starting with high values, $\sigma$ is slowly decreased to the desired
value, while the learning rate is still kept safely low. From there, at
constant $\sigma$, the learning rate is increased until instability is
observed, giving an upper border of the stability area. The same scheme is
applied for the concave and convex learning, where the nonlinearity exponent is
considered as a fixed parameter.
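
The two-phase parameter path can be written as a simple schedule generator. This is a sketch under assumptions: the function name `parameter_path` and all argument names are hypothetical, and the geometric (exponential) spacing of both phases is an illustrative choice rather than the exact schedule used for the figures.

```python
import math

def parameter_path(sigma_start, sigma_target, eps_low, eps_max, n_decay, n_scan):
    """Yield (eps, sigma) pairs along the two-phase path.

    Phase 1: sigma decays geometrically from sigma_start to sigma_target
             while the learning rate stays at the safely low value eps_low.
    Phase 2: at fixed sigma_target, eps grows geometrically up to eps_max;
             the caller stops as soon as an instability indicator fires.
    """
    for i in range(n_decay + 1):
        frac = i / n_decay
        sigma = sigma_start * (sigma_target / sigma_start) ** frac
        yield eps_low, sigma
    for j in range(1, n_scan + 1):
        frac = j / n_scan
        eps = eps_low * (eps_max / eps_low) ** frac
        yield eps, sigma_target
```

A driver loop would train for a fixed number of stimuli at each yielded pair and record the first `eps` of phase 2 at which the chosen stability measure exceeds its threshold; that value estimates the upper stability border for the given `sigma_target`.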

Different input dimensions and varying intrinsic dimension. - As the input
dimensionality has a pronounced influence on the maximal stable learning rate
(Fig. 6), we also investigate an artificial dataset combining different
dimensions: the box-plane-stick-loop dataset [?] (Fig. 3) or its 2D
counterpart, the plane-stick-loop (Fig. 4). Here the crossproduct detection
becomes problematic where the input space is intrinsically 1D (stick and
loop); thus the average distance criterion is used, and we restrict ourselves
to the case $\sigma < 1$.




Figure 3: Schematic view of the classical box-plane-stick-loop dataset. Its moti- 
vation is to combine locally different input data dimensions within one data set. 



[Plot: "Plane-stick-loop data input"]

Figure 4: Part of the input data for the 2D plane-stick-loop data set (Fig. 3).
[Plot, top panel: "Average distance based stability measures"; logarithmic axes]

Figure 5: Critical $\epsilon_{\max}(\sigma)$ where (coming from small
$\epsilon$ values, see Fig. 2) the SOM learning loses stability. Here a 2D
array of 10x10 neurons was used, with exponential decay $\exp(-t/k)$ in time
$t$ and $k$ between 30000 and 60000 depending on $\epsilon_0$ (for $\sigma$
between 0.1 and 0.001, $k = 300000$). Top: Unstable $\epsilon$ detected from
growth of the averaged distance of neurons; here a threshold of 15% was chosen.
For large $\sigma$, this measure becomes less reliable due to shrinking of the
network, i.e. $\forall r: w_r \to \langle v \rangle$. Bottom: Unstable
$\epsilon$ detected from the crossproduct measure, eq. (3), with a threshold of
1 defect per 100 iterations. The $\epsilon$ value depends on the data
distribution, but the qualitative behaviour remains similar. In all cases,
below a certain $\sigma_{\mathrm{crit}}$ of about 0.3, $\epsilon$ has to be
decreased significantly. This independently reproduces [16]; here we also
investigate different $\epsilon$ values.
[Plot, top panel: "Average distance based stability measures"; legend: 1D-1D, 2D-2D, 3D-3D]

Figure 6: Stability border dependence on input dimension (1D, 2D, 3D). The
known 1D case is included for comparison. Top: Using the average length
criterion (for $\sigma > 1$, the result can be misleading due to total
shrinking of the network, see text). Bottom: Using the crossproduct detection
for defects, similar results are obtained; for large $\sigma$, instabilities
are detected earlier.
Concave and convex learning: Stability of nonlinear learning. - The simulation
results are given in Fig. 7. Clearly, a strong influence of the nonlinearity
parameter $K$ is observed. In particular, one has to take care when decreasing
$\sigma$, because for too large $\epsilon$ the network becomes unstable. For
$K < 1$, only much smaller values of $\epsilon$ are possible, so considerably
longer learning phases have to be taken into account compared to the original
SOM. For $K > 1$ the stability range becomes larger.




Figure 7: Critical $\epsilon_{\max}(\sigma)$ for different values of the
nonlinearity parameter, from $K = 2.0$ (top) down to the smallest value of $K$
(bottom). $K = 1$ corresponds to the SOM case.
Discussion. We have defined a standardized testbed for the stability analysis
of SOM vector quantizers with serial pattern presentation, and compared the SOM
with the recently introduced variants of concave and convex learning. The
stability regions for different input distributions and dimensions are of the
same shape, thus qualitatively similar, but do not coincide exactly. The
neighborhood width, but unfortunately also the input distribution, affects the
maximal stable learning rate. For the concave and convex learning, the exponent
steering the nonlinear learning also crucially influences the maximal stable
learning rate. In all cases, a plateau for $\sigma \ll 1$ is found where the
learning rate must be quite low compared to the intermediate range
$0.3 < \sigma < 1$. As an overly cautious choice of the learning rate simply
increases computational cost, an accurate knowledge of the stability range of
neural vector quantizers is of direct relevance in many applications.